home *** CD-ROM | disk | FTP | other *** search
Text File | 1993-09-23 | 85.5 KB | 1,477 lines |
- Internet-Draft <draft-chiappa-routing-01.txt> Expires on April 1994
-
-
- A New IP Routing and Addressing Architecture
-
- J. Noel Chiappa
-
-
-
- [ Author's note: This document is not yet fully complete, but it is being
- distributed in this rough form since there appears to be a considerable
- appetite in the network community for discussion on this topic. It is hoped
- that this note, even in its present form, will facilitate useful debate about
- the issue. ]
-
- Status of the Memo
-
- This document is an Internet Draft. Internet Drafts are working
- documents of the Internet Engineering Task Force (IETF), its Areas,
- and its Working Groups. Note that other groups may also distribute
- working documents as Internet Drafts.
-
- Internet Drafts are draft documents valid for a maximum of six
- months. Internet Drafts may be updated, replaced, or obsoleted
- by other documents at any time. It is not appropriate to use
- Internet Drafts as reference material or to cite them other than
- as a "working draft" or "work in progress."
-
- Please check the I-D abstract listing contained in each Internet
- Draft directory to learn the current status of this or any
- other Internet Draft.
-
-
- 1 Introduction
-
- This document presents one possible new IP routing and adressing
- architecture. By routing and addressing it is meant that part of the overall
- IP architecture which is responsible for identifying computing nodes,
- where they are in the Internet, and how to get traffic from one to another. It
- represents one person's view of a good overall system answer to this question,
- and is not to be taken as anything more than that.
- While this document will be of most interest to those with some degree
- of familiarity with routing concepts, it is aimed at the general network
- community, which will have to agree with the general direction taken in
- evolving this aspect of the IP architecture. While the subject is a complex
- one, the document has been written to be accessible to a general audience. As
- such, it contains much material which will be familiar to those who are versed
- in routing work, which has unavoidably lengthened the document.
-
- The goals of the new architecture are more or less those outlined in
- RFC1126 [Little], particularly Appendix A, but modified somewhat to include
- what I feel are real constraints, such as more complex external policy
- concerns. To review them briefly, they are to allow a system substantially
- larger than the current one (if possible of essentially unlimited size), to
- address the policy concerns which are of increasing importance, to allow
- topologies of a much more complex nature, and to retain the flexibility of the
- current system. A concise outline of the goals believed necessary are given in
- the section below.
- These design goals are a very agressive set of requirements. They
- exceed considerably the existing capabilities of other large communication
- systems, such as the telephone system. Two aspects in particular represent a
- major leap. The first such aspect is the attempt to create a ubiquitous
- communication system (and route traffic through it) out of a very large number
- of cooperating autonomous units (although in some senses the road network does
- currently accomplish this goal). The second is the assumption that the system
- will be extremely dynamic; i.e. respond in real time, without human
- intervention, to changes in the topology and other configuration aspects of
- the system.
-
- To meet these goals requires a substantial modification to the basic
- philosophy and mechanism of the existing IP packet layer architecture. Since
- the system is now widely deployed, a change of this sort - an in-place
- modification to a large, operating, system - presents a substantial challenge.
- To meet it it is first necessary to solve the problem of routing, and
- only then turn to addressing. As will be seen, routing (and especially the
- abstraction of data which will be necessary to handle a very large system) is
- a sufficiently difficult problem that any way of making the problem easier is
- desperately needed. Addressing needs to be completely sublimated to the needs
- of the routing architecture; the form of addresses should be whatever is most
- useful to the routing. Dividing the address up in such a way as to ease
- administrative tasks is not correct, although a separate mechanism to handle
- such administrative concerns is possible.
-
- It should be noted that this document is not an engineering design
- document. No detailed mechanisms have been laid out, and undoubtly many
- pitfalls remain along the path to a complete system. Little attempt has been
- made to consider optimizations for common cases and other engineering
- practises; no study has been made of traffic patterns, and of course the cost
- of the actual mechanisms is unknown since the design has not been done; both
- are necessary to guide the cost/benefit choices inherent in good engineering.
- Rather, this document contains a study which attempts to clarify what are
- the fundamentals of the problem, and then identifies what seems to be the
- best path available to solve that problem.
- Also, it was felt that rather than proceed forward along a visible
- path from where we are now, the best method (when considering the long term
- health of the the network) was to act as if a blank piece of paper were
- available. When the best possible design was created, it would at that point
- be appropriate to see whether it was feasible to reach that design from the
- current system. Proceeding from the optimal end point backward seems more
- likely to result in the best long term results than proceeding forward from
- the current state of affairs in an incremental fashion.
- Thus, what this document is an attempt to decide on a general plan of
- attack on the problem; a broad-brush architecture based on a long study of the
- fundamental issues and problems in routing and addressing in a very large
- network.
-
-
- 2 Design Goals
-
- The list of goals for the new architecture are of three types. The
- first are those capabilities that the existing architecture already has, which
- are still needed. The second are things that it cannot do, and which are
- proving to be major hindrances in actual operation. The third are those
- necessities which will be required in the future, and which may not be obvious
- now.
- There are also certain longstanding goals of the IP architecture,
- such as the ability to support mobile hosts, which have previously been
- difficult. It proves possible to support these in the new architecture, as
- will be seen.
-
- 2.1 Current Capabilities
-
- First and foremost, the new architecture must be incrementally
- deployable with respect to the existing system. The capital investment in
- existing software and plant, and the day-to-day operational dependence of many
- people, is so large that to start from scratch would be phenomenally
- expensive, and utterly impractical. If there were no way to do it without a
- fresh start, perhaps it would be practical, but since it seems to not be
- necessary, incremental deployabilty is a must. This does not preclude eventual
- replacement of the existing architecture, obviously; to retain it indefinitely
- would unduly cramp the system.
- In specific terms, a new system should not require any changes to
- hosts to become fully operational (although eventual changes to hosts are
- allowable to take full power from the new scheme); too many unmodifiable
- existing hosts exist, and support for them must be maintained for some time in
- the architecture.
- Second, changes to routers (which are almost certainly unavoidable if
- anything concrete is to be attained) need to be interoperable (for a period,
- at least) with the existing system, although it is reasonable in the case of
- routers to insist that basically all be converted before the new architecture
- can be fully operational. Some feel that even changing all the routers is too
- ambitious, and unnecessary, but it appears that this goal is unneeded, and
- conflicts with the general principle of providing the best possible system
- for the long term.
-
- Next, the current system allows minimal firewalls between AS's; errors
- in routing data from outside the AS cannot (in a properly designed AS)
- interfere with communication inside the AS. While the concept of an AS is
- perhaps outdated, this property is desireable, although it needs to be
- strengthened and made more flexible. As things stand, a single misbehaving AS
- can cause almost limitless problems in the inter-AS routing. A systematic way
- of minimizing such problems needs to be included.
- Finally, it must be vendor-independant; that is, specified in enough
- detail that an implementation by a new vendor, operating solely from the
- specifications, will interoperate correctly with other existing
- implementations.
-
- 2.2 Current Limitations
-
- The things about the current system that are most limiting are the
- limit on the total size of the system, the lack of support for policy routing,
- and the inability to support arbitrary topologies (although this last
- restriction is being lifted by the deployment of BGP).
-
- The size restriction has two phases. First, a key existing routing
- protocol (EGP) has a small limit on the total number of networks, but, as
- with the problems of topology, as BGP is deployed as a replacement, this
- limit will fade.
- The second set (which consist of three far more fundamental problems)
- is that the system is running out of network numbers, the ability to route
- them, and eventually, out of address space completely. This topic is not
- treated here in detail [Chiappa91], but briefly, the first problem involves
- the accelerating use of the limited number of network numbers, the second
- concerns the difficulty of routing in a flat address space which is growing
- (very quickly, to boot), and the third recognizes the eventual limits of the
- 32 bit address.
-
- Policy routing is the phrase used to describe external, administrative
- input to the data routing process. Currently, the system provides little to no
- support for this. The ad hoc mechanisms being used are so complex as to
- confuse all but the most skilled, and contain substantial potential for
- errors. The main problem is that the existing mechanisms are fairly unrelated
- to the desired goals, and the use of these mechanisms to produce the desired
- goals is thus complex and often impossible; they simply do not address
- existing policy requirements.
- Policy requirements can be divided into three areas, which can be
- called a) 'access control', control over which external users are allowed to
- use a given resource; b) the 'trust model', control over which external
- resources a given user is willing to use; and c) 'information hiding',
- control over which external users are allowed knowledge of a resource.
- While these divisions are not necessarily the theoretical minimum, they do
- appear to accurately follow the needs of the majority of users; providing
- these will make configuration of the system to provide the desired policies
- far simpler by virtue of the close match between what is desired and the tools
- available to do it.
- BGP will do little to radically increase capabilities in this area.
- About the only additional capability provided by BGP will be that since
- complete path information to the destination is available, a limited amount of
- additional 'trust model' control will be available, as will some 'access
- control'. However, as the actual controls are still basically those currently
- available, the ills of the existing system remain.
- One thing to note is that while enough policy requirements have been
- verbalized to build a general model, no complete list of what will be required
- is by any means possible at this point. This means that whatever policy system
- is created will have to be able to add new policies as time goes on.
- This latter requirement will be discussed in more detail later. Note
- also that the second policy class (trust model) implies control by the source
- of the route through the network. This also has substantial implications.
-
- The current restriction on interconnection, which prohibits cycles,
- is untenable on reliability grounds (since alternate paths are eliminated) as
- well as difficult to enforce. As noted, this restriction is based on the
- technical limitations of the EGP protocol, and as BGP is deployed as a
- replacement, this limit will fade.
-
- Finally, another problem noted with the current system is the use of
- toll networks. This includes not only the cost of running the routing
- protocols over such networks, but also distributing the information that
- certain paths cost money, and allowing traffic the choice of whether or not to
- use such networks. In some sense this latter requirement is a variant of
- policy routing, but in another it is yet another requirement, often labelled
- 'type of service routing', by which it is meant that different types of
- traffic needs links of different characteristics. For instance, remote login
- traffic needs low delay links, while bulk data transfer needs high bandwidth.
- In the cases of both type of service, and access control, it is also
- necessary to allow for future expansion of the types of each supported, since
- it is unlikely that the complete set can be specified at this time.
-
- 2.3 Future Requirements
-
- In the category of requirements that are not yet restrictive, but
- of which we are becoming aware, one very important one is to provide a
- system that is more immune to problems, of both accidental and deliberate
- cause. The fact that this system will connect effectively everything means
- that great reliance will be placed on it, and equally great confusion will
- be caused should it fail. Engineering failures need to be prevented by
- thorough simulation as well as use of large scale testbeds. Deliberate
- attack needs to be prevented by the inclusion of provisions for security;
- these must provide for the use of one of a (extendable) variety of optional
- security mechanisms, to allow the cost and level of security to be
- optimized depending on the needs and cost of failure.
-
- Another important need is to minimize the amount of configuration
- needed, to minimize the possibility of having conflicting configuration
- information, and to prevent such conflicting information from causing a
- problem. The existing routing architecture suffers from being overly complex
- to configure. While this may not be obvious to technically sophisticated
- people with long association with the network world, many new users find the
- existing architecture confusing and difficult to set up. While they perhaps
- expect too much, since you cannot set up a complex system with little or no
- study and engineering, improvements can be made. In addition, conflicting
- configuration information currently can cause severe problems; this another
- reason why such information needs to be minimized. Finally, the architecture
- must be robust in response the inevitable errors in configuration.
-
-
- 3 Routing and Addressing Fundamentals
-
- After some consideration of the problem, it became obvious that the
- most difficult of the requirements is the size of the system. While some
- approaches make light of this problem, they do so at the cost of restrictions
- which make other stated requirements difficult to meet. The new routing
- architecture had not only to provide capabilities unheard of in the older
- system, but also do so in a very large system. Providing the new capabilities
- proved to be not so difficult, but the size problem was less tractable. It is
- not practical to always provide complete information about every detail of
- the connectivity and routing of the entire to everyone; some way of reducing
- the data was necessary.
- At this stage it is useful to introduce some technical terms.
- 'Compression' is taken to mean reducing one set of routing information to
- another which, while more compact, is effectively the same; i.e. would
- produce the same routing decisions in all cases. 'Thinning' is taken to mean
- discarding some of the data to reduce it to a manageable size. In this case,
- routing decisions taken using the thinned data are possibly non-optimal,
- since some necessary information is unavailable. 'Abstraction' is a term
- including either of these two possible ways of reducing the volume of data.
- 'Attenuation' means that information becomes less complete the further you
- are from the source of the information.
- However, before we examine this part of the problem in detail, it is
- necessary to review some fundamentals of routing and addressing.
-
- 3.1 Routing Algorithms
-
- The family of routing algorithms with the most promise is the
- 'link-state' (LS) group. They are so named because the information that is
- distributed among the nodes running the routing protocol is about the
- existence and state of the connecting links in the system (i.e. a topology map
- of the system) rather than distances to destinations, as in the
- 'distance-vector'(DV) algorithms.
-
- 3.1.1 Distance-Vector Algorithms
-
- In a DV algorithm, each node sends to all of its neighbours a
- complete table of its distance (measured in some agreed upon metric(s)) to
- all the destinations in the system (hence the name). For destinations it can
- reach directly, it advertises some low metric; for destinations it reaches
- through some neighbour, it lists what that neighbour told it plus some
- increment. Each recipient node then compares the advertised distance to the
- one it currently has for that destination; if what it just heard was better,
- it records it and routes traffic to that destination to the neighbour node
- that sent it the update.
- There are many variants, which have been designed to add capabilities
- or fix problems with the algorithm, the most common of which are that DV
- algorithms tend to form temporary routing loops when the topology changes,
- before things stabilize, and that they are unable to detect that a destination
- becomes unreachable until the distance to it passes some arbitrary 'infinity'.
-
- 3.1.2 Link-State Algorithms
-
- In the simplest form, LS algorithms distribute a complete topology
- map to each switching node in the system. This is done quite simply; each
- node generates a message describing the state of the links attached to it,
- and this message is then quickly flooded through all the nodes in the system
- using a reliable flooding protocol. A toplogy map can easily be constructed
- from the messages from all the nodes in the system.
- These nodes may then run in parallel an algorithm which generates
- from this topology map a routing table which can be used to forward packets
- to various destinations. The MILNet is currently using a LS algorithm where
- the algorithm which computes the routing table is one due to Dijksta called
- the 'Shortest Path First'. [Dijkstra]
- However, it should be noticed that the second step, creation of a
- routing table from the topology map, is purely an engineering decision. It is
- not necessary to precompute the entire routing table in an LS algorithm,
- whereas it is in a DV algorithm, since this precomputed routing table is the
- data that is passed on to the neighbour nodes. In an LS algorithm, it is
- perfectly plausible to compute routes on demand, from the topology map, and
- store them in a cache. This reduces the computational burden substantially,
- and, as we shall see, is a key element in the solution.
- Also, in the classic LS routing scheme, the one essential is that all
- the nodes run the same route generating algorithm from a consistent map
- database, otherwise it is possible to form routing loops when two nodes
- disagree about the best 'next hop'. However, if traffic is not routed using
- the 'hop-by-hop' method, both of these requirements can be relaxed; this is
- critical, as will be seen.
-
- 3.2 Routing Algorithm Choice
-
- Several advantages were noted about the LS class of algorithm that
- recommended it to the ARPANet implementors [McQuillan]. Most of these are
- traceable to the fact that since the baseic DV algorithm is a distributed
- algorithm for calculating routes, it is in many senses less robust and
- characterizable (in a response time sense, not an ultimate stability after a
- single change) than the LS algorithm, which runs in parallel locally on all
- machines.
- First, it stabilized quicker after a topology change. This was
- important; since the ARPANet used load-based routing, the metrics on links
- changed fairly frequently. The speed was due to the fact that when a link
- changed state, that fact became know almost instantly everywhere (due to the
- fast flooding), after which the nodes all quickly recomputed their routing
- tables in parallel, at which point the routing had stabilized in the new
- pattern. This is in contrast with DV algorithms, where the effects of a change
- in the topology had to pass through computations in each node on its way
- across the network; additionally, the process might have to be repeated
- several times before the new stable pattern was achieved.
- Second, it detected destinations which had become unreachable more
- quickly; this enabled the switches (which offered a reliable transmission
- service) to discard packets for the dead destination more quickly, avoiding
- buffer congestion from undeliverable and undiscardable traffic in the network.
- Third, it was less likely to form loops (formation of which had been a severe
- problem under the old DV routing algorithm, due to certain ill-advised
- modifications in the algorithm which had promised other gains), and quickly
- broke those which did form.
- These advantages were not particularly useful to this application.
- Since the system was so large, it was felt that practical load-based routing
- was not possible. In such a large system there would be frequent topology
- changes in any case; there were concerns that with the added changes of load
- based routing that the routing would be less stable. The stability is
- certainly useful, but not as crucial when the metrics are essentially static.
- Likewise, the other two advantages, while noteable, can be met with
- modifications to the basic DV algorithms.
-
- The DV algorithms do appear have one advantage, which is that they
- are apparently more suitable to routing in large nets. Since data is
- processed through the routing table before being passed on, they very easily
- can be modified to provide attenutation as a way of abstracting data.
- (The algorithm is quite simple; as soon as the routes to all the subelements
- of a routing element have the same next hop, it is no longer necessary to
- send out routing tables listing each subelement individually; a single
- entry for the element will suffice.)
- However, this advantage is overwhelmed by a disadvantage, which is
- that they cannot easily be enriched to handle the problems of both policy
- based and type of service routing (which are technically very similar) without
- exponential explosion in the amount of data which must be computed and
- transferred. Since route calculation in DV algorithms depends on continuous
- calculation of all possible routes in a distributed computation, handling
- policy considerations means that routes to all destinations under all
- conceivably desirable combinations (or allowed combinations, depending on the
- system design) of policy requirement must be maintained. This is the only way
- to have a route available for a given destination/policy combination when
- traffic for that combination appears (or within any reasonable time of such
- traffic appearing, as explained below).
- (A possible fix for this problem involves only computing a routing
- table entry for given destination/policy combination when traffic for that
- particular combination arrives. Before that traffic could be forwarded,
- however, the distributed algorithm would have to run, probably to
- stabilization (discovering when that happens would be a good trick in and of
- itself), to calculate the next hop at the first hop. This almost certainly
- presents an unacceptably long delay before the traffic can be forwarded.)
- In any case, the distributed nature of the DV algorithm requires far
- more line bandwidth and computational power than the LS algorithm when
- policies are included; in a large network these alone will present substantial
- difficulties. Use of a DV algorithm in a world with policy routing
- requirements is simple infeasible.
-
- A key component of handling the requirements is the realization that
- LS algorithms can handle this problem quite easily. [Chiappa84] Since each
- node has a complete map of the system, inclusing all the links, it is quite
- easy to indicate on each link what 'attributes', of either policy constraints
- or service type, that link possesses. Since the LS algorithms can compute
- routes on demand, it is only necessary to create a route for a particular
- combination of reqirements when it is needed. It is easy, and costs little,
- to tag each link with its attributes before the link information is flooded
- through the system.
- It is this characteristic of LS algorithms, that the actual route
- calculation is separate from the distribution of the data, and can be delayed,
- that is the key one. The fact that the calculation can be delayed (which makes
- handling complex policy routing possible) is a fundamental difference between
- LS and DV algorithms, and a key reason that DV algorithms are fundamentally
- unsuited for use in the new IP routing architecture.
-
- The problem with LS algorithms is that they do require a complete map
- of the toplogy. As explained above, this is impractical, due to the size of
- the target system. However, it is possible to modify LS algorithms to use
- abstraction, so this is the approach chosen. This decision brings other
- problems, though.
- It turns out that compression alone is not an adequate means of
- attack on the problem. There are too many topologies that simply cannot be
- represented by a simpler but equivalent topology. Since the data must be
- reduced to make the problem manageable, it is necessary to discard some, and
- use thinning. The ramification of this step is that when routing data is
- thinned, the lost information means that in many cases the system will fail
- to detect possible routes with the required characteristics. This can lead to
- the creation of non-optimal routes, or in the worst case, failure to find any
- route at all, even though one does in fact exist.
- The hardest problem thus becomes managing the discarding of
- information, based on cost-benfit tradeoffs which the protocol cannot possibly
- comprehend. Simply put, data must be discarded to make the cost of running the
- protocol reasonable. However, taking this step has a cost elsewhere, in
- non-optimal traffic routing; the problem is a classic cost-benfit tradeoff.
- Thus, thinning involves value judgements, and is thus a matter of policy as
- well as a technical problem.
- The ramifications of these issues will be addressed later.
-
- 3.3 Addressing Fundamentals
-
- Before moving on to review the proposed architecture, the final piece
- of background that is necessary is to briefly review the fundamental network
- concepts of addresses, routes, etc.
- Although a number of papers have been written on the subject of
- addresses and routes [Shoch] [Saltzer82], a considerable lack of understanding
- of the basic aspects of these fundamental concepts still remains. Readers need
- to be clear in their minds the difference between fundamental concepts (and
- the objects which are the concrete instantiations of these fundamentals), and
- the different kinds of names (and the reasons for and uses of these names)
- which we may give these objects.
-
- 3.3.1 Object Spaces and Names
-
- There has been confusion as to exactly what the fundamental object
- spaces of networking are; most previous architectures have tried to make do
- with too few. Previous work has also pointed out that common practise in
- networking has been to too tightly bind the forms of names for these objects
- to the objects themselves.
- Two key object spaces used in this architecture are i) the node
- identifier, which indicates one particular computing node to which packets
- may be delivered, and ii) the network address, which specifies an attachment
- point to the network; a network interface to which traffic may be routed. (In
- fact, the term "address" is usually taken to refer to a particular kind of
- name for a network attachment point.) Multicast concepts are not considered
- here, since multicast can be built as a layer on top of this, and is thus less
- fundamental.
-
- Each of these object types may have one or more kinds of names. The
- names may or may not haves structure. Names with structure are invariably
- adopted to make some mechanism (often a mapping from that name to something
- else) easier to implement. For instance, a attachment point may be identified
- either by a unique, fixed length binary flat identifier (e.g. an Ethernet
- interface number in a bridged network), or, as is more usual, a structured
- heirarchical form (e.g. a typical IP address). In addition, there are mappings
- between names and objects, and between different object classes.
- As was mentioned, problems in thinking about the issues of routing and
- addressing can occur when these object spaces are not clearly differentiated,
- or when the form of the name for an object is not separated from the concept
- of the object. An example of the former is the lack of distinction in the IP
- architecture between attachment points and node identifiers; this has left us
- with great difficulties in handling multi-homed hosts and mobile hosts. An
- example of the latter is that a network address is usually taken to be a
- structured heirarchical item, but this is in fact just one possible way of
- naming attachment points; a very useful name, to be sure, but not the only
- one.
-
- Whenever one has different classes of objects, and names for them,
- several questions immediately arise. The first is "do we have the right
- classes", second "what is the mapping between the classes of objects", third
- is "how many names, and of what form, do we have for each class of object",
- and fourth "what is the mapping between the names and objects".
- The mappings form the heart of the power, and the problems. Two
- aphorisms of computer science illustrate this: the first (due to B. Lampson)
- is "All problems in Computer Science can be solved by adding another level of
- indirection"; the second (due to D. Clark) is "The function of an operating
- system is to establish many different names for the same object, so that it
- may busy itself keeping track of the mappings between the various names."
- Do we have the right fundamental objects? Exactly which names and
- mappings are useful in this instance?
-
- 3.3.2 Names and Mappings
-
- It is clear that the two object classes named are highly useful. Are
- there any remaining problems, the solution to which requires yet another
- object class (and associated mappings)? Only one appears possible; the problem
- of heirarchical addresses implying heirarchical routes. This topic cannot be
- treated in full at this point in the document; the architecture as given
- solves this problem, but perhaps not in an efficient enough way. It may be
- necessary to add an extra object class to deal with it. For the moment,
- however, the answer appears to be that the two mentioned are enough.
- As to the mappings among the classes, this also appears simple enough.
- Just as in the real world, a single node may have several network interfaces,
- a single node identifier may map to more than one attachment point. Also, just
- as hosts may move, the mapping from a node identifier to a network attachment
- point may change.
-
- The only namespace for nodes which appears to be useful is a
- relatively short, fixed length, pseudo-flat binary identifier. This name is
- intended for solely for globally unique identifaction of nodes, and for
- efficient processing by automatons; this dictates the form of the name. (Since
- the space will probably be split up to various number assignment authorities
- for each of administration, it only appears to be flat.) Since it appears (for
- reasons to appear below) that the most useful form of the network address is
- long and complex, having at least one name for the destination in the packet
- which is easy to manipulate is desireable. There does not appear to be a use
- for a different kind of node identifier at the moment, but it could be added
- if need be.
- This is obviously in addition to the existing domain names, which are
- heirarchical character string names for hosts. These names are obviously
- intended to be useful to humans, and the structure in them is intended to make
- interpretation by a distributed database (the Domain Name System) easier. It
- is entirely feasible that domain names which represent, for example, services
- could map to more than one node identifier, but that is a different issue.
-
- The names of network attachment points, the address, present a more
- interesting problem. An address usually does two things. First, it uniquely
- identifies a connection point to the network. Second, it does so in a
- structured manner, which allows something useful to be done with it. Exactly
- what that structure is depends on what other uses it is put to; most often the
- structure of the address is used in routing.
- Normally, groups of attachment points which are topologically related
- are given related addresses. Usually, this allows the number of different
- destinations which must be tracked in the various routing devices througout
- the network to be reduced (in its simplest form, this allows routing tables to
- be smaller). That is, rather than keep knowledge of all the individual network
- attachment points, a routing device can keep information on collections of
- network attachment points, at a considerable savings.
- It may have other benefits; in this architecture, it also does several
- other very important things. First, the structured address allows the point on
- the topology graph which the address names to be found easily, without mapping
- through any additional object class, or other attachment point name. Second,
- it helps provides a representation by which topology information can be
- distributed. Third, the structure of the address space provides a framework
- for the abstraction process by which simplified graphs of the topology are
- created.
- Finally, in the current version of this architecture, the address
- (which is heirarchical) in involved in creating those routes which take the
- least amount of effort to compute (sometimes called "no-brainer" routes),
- that route being basically heirarchical. (It is this latter function which may
- prove inefficient to overload onto the address of a network attachment point,
- necessitating the creation of a different object class or naming space for
- network attachment points.)
-
- Note that all these functions are usually performed by a single
- kind of name, a structured address. However, other possibilities have been
- suugested.
- One proposal is that each network attachment point be given a unique
- binary flat identifier; this would allow the creation of multiple,
- independant, structured namespaces (i.e. address spaces), each of which have a
- separate mapping into the first namespace which identifies all attachment
- points. The claimed advantages are that it would be possible to experiment
- with new addressing schemes, or introduce a new one, or even operate with
- several different ones in use at the same time.
- This possibility has been discarded in this architecture, however. The
- whole point of an addressing system is to allow traffic to flow between all
- the myriad parts of the Internet. To do that, it is necessary to pass around
- routing information, and the form in which this is passed around is inevitably
- that of addresses. A new address scheme, resulting in an address form which
- the old routing architecture did not understand, would inevitably interrupt
- this flow.
- Moreover, should it be necessary to supersede the addressing scheme in
- this architecture (if adopted), it would be just as easy to create a new
- mapping from node identifiers to attachment points as it would to create a new
- mapping from attachment point names to attachment points. Alternatively,
- mappings could be created from old attachment point names to new ones
- directly. No good reason can be seen for the flat namespace for attachment
- points, so incurring the cost of it should clearly be avoided.
-
-
- 4 Routing and Addressing Architecture
-
- It is necessary to consider some ramifications of some of the
- requirements in detail, and the steps taken to meet them, before the entire
- architecture can be outlined.
-
- 4.1 Requirement Ramifications
-
- The two major ones have to do with the trust model requirement, and
- the ability to expand the attribute list.
- Briefly stated, the trust model problems is that some users may have
- policies on which links they will and will not use that cannot be described
- in anything less than a general purpose computational notation. Moreover, in
- the most extreme cases, some users may not trust outside agents to pick a
- route for their data, even given a complete statement of their requirements.
- This means that computations in intermediate nodes cannot be used to route
- traffic. This flies in the face of the current architecture, which assumes
- traffic is routed on a hop-by-hop basis; i.e. each switching node makes an
- independant decision on what to do with the traffic.
- The attribute list problem is a more general consequence of the
- specific observation above, in the discussion of the requirements, that it is
- probably not possible to completely specify what policy attributes will be
- needed at the moment. If the new system is to last, it must be able to change
- and expand its capabilities. To do this, it is necessary to be able to invent
- new types of attributes for links, and phase them in incrementally (since
- such changes cannot be coordinated instantaeously across a large system).
- This presents a problem, since this seems to violate the restriction given
- about in the discussion of LS algorithms, namely, that all the nodes run the
- same route generating algorithm. If one node is taking certain attributes on
- links (and requests to use them in packets) into consideration when
- generating routes, and another is not, this can potentially form routing
- loops.
- These two problems appear substantial, yet both can be solved by the
- same mechanism. In addition, that mechanism allows yet more problems to be
- solved, as detailed below.
-
- 4.2 Architectural outline
-
- The general outline of the architecture is as follows (each element
- will be discussed in more detail below).
- The routing architecture will be a multi-level area (i.e.
- heirarchical) LS system; in the topology graph representation of the network,
- links and nodes have attributes. The exact algorithm for generating routes,
- given the topology map, is not specified as part of the protocol (but a
- default algorithm can be provided as an appendix). Abstraction of data will be
- determined primarily by external mechanisms (although a simple default will be
- provided either as part of the protocol or as an appendix), and in any case
- clients are not bound to accept any abstraction unconditionally. Routes will
- be completely determined by the initiator of the traffic; i.e. all packets
- will be source routed. Packets will belong to ephemeral associations called
- 'flows'. (There may be other forces acting on the architecture which make
- flows desirable; if so, several things can be done with this new mechanism.)
- A new kind of address will be created, for use in making routing
- decisions. The existing IP address field will be retained as well (exactly as
- is in the short term), but the contents will be put to slightly different use
- as a node identifier.
- In the short term, hosts will continue to operate with no changes at
- all, and use the existing packet formats for their interactions with routers,
- although there will be certain optional optimizations that speed things up,
- but are not necessary. In the long term, a transition to a longer version of
- the node identifier (the old IP address) will be necessary. A number of
- factors will probably eventually combine to cause the definition of a new IP
- packet format, but most of those forces are outside the scope of this
- document; however, phasing this longer version in will probably occur as part
- of that transition.
-
- A key thing to notice about this architecture is what is not part of
- the architecture and protocol. Neither the algorithm to create routes from
- the topology database, nor the exact method by which abstraction occurs are
- part of the architecture. These are also the two hardest problems, and the
- two in which future work is most likely to bring improvements. It is a key
- feature of this proposed architecture that these two areas can be left
- outside the scope of the protocol; future improvements can be easily phased
- in incrementally.
- Route generation in this architecture will need routing algorithms
- which have a different goal from most existing work, such as the Dijkstra
- algorithm. Most known routing algorithms create an entire routing table of
- optimal routes to all destinations; since this has been what was needed in the
- past, this is the area in which most research has occurred. The problem of
- finding the optimal (or a good approximation thereof, within limits) route
- between two points in a graph, ignoring other destinations, has not been
- a topic of much work. This only increases the need to leave flexibility in
- this area; improvements in algorithms and heuristics tend to be slow to
- appear.
-
- The things which are part of the routing architecture and protocol are
- a way of representing and characterizing the network topology, a way to
- distribute this information, and a way to set up traffic flows. Everything
- else can be left and specified later; a key to the power and flexibility, as
- well as the ability to last (gracefully), of this architecture.
- Upon some reflection this seems likely to be a theoretical minimum
- for performing those tasks. It is not clear whether the fact that the two
- hardest problems are left outside is fortuitous or a reflection of some
- deeper correctness. In any case, the minimal design does limit the amount of
- work to be done, as well as limit the possibility that something will be done
- wrong or in an inflexible way.
- "Perfection has been attained, not when there is nothing left to add,
- but when there is nothing left to take away." [Corbato]
-
- 4.2.1 Multi-level LS Algorithm
-
- The LS class of algorithm is chosen, basically for the reasons listed
- above in the discussion of these algorithms. It is really the only practical
- way to handle a system with complex acccess restrictions on links which form
- a key part of policy considerations. The expandable attibute list allows
- future growth.
- The algorithm has to be multi-level to handle the very large amount of
- link state in the system. Even if the system were restricted to distributing
- information about AS's, the general consensus is that a single level will not
- be able to handle the number of AS's we expect to be deployed in the
- reasonable future.
-
- However, it is also felt that the time and need for AS's, and the
- strict division of routing in inter- and intra-AS, is past. AS's are a limited
- and simplistic mechanism that was created long ago to fulfill certain limited
- goals, primarily that of providing routing firewalls; since the new
- architecture can do all these, and better than the AS concept, it is now
- entirely appropriate to discard AS's. (This does not mean that the concept of
- policy groupings would be discarded; far from it. What it does mean is that
- there will not be a particular layer which is called out for special
- treatment.)
- The primary reason for the introduction of AS's was to provide some
- routing firewalls in the system. Since the new architecture will problem more
- flexible and robust firewalls, retaining the archaic AS mechanism is
- pointless. An additional advantage of AS's (and the division of the task of
- routing between IGP's and EGP's) was to allow different routing technologies
- to be used. The extreme power and flexibility of the new architecture makes
- this less useful, and in any case, the fact that the process by which the
- representation of an area is constructed (the abstraction process) is not in
- the scope of the protocol means it would still be possible to use other
- routing technology within an area, and pass the results of that process up as
- the representation of the topology of the area.
- The new system would be responsible for all routing in the IP
- architecture; it would be able to depict, at its lowest level, the physical
- transmission assets of the system. This unification of routing would not only
- reduce the complexity (and likelihood of problems, implementation and
- configuration cost, etc), but bring the full power and flexibility of the
- new architecture to bear at all levels. A number of problems exist today
- because of the split between inter and intra-AS routing; removal of this
- artificial barrier allows greater flexibility.
-
- The use of a link-state architecture does bring with it the attendant
- difficulties of how to abstract link-state data. The use of thinning,
- necessary for useful abstraction, brings with it the difficult task of chosing
- which data to discard, and the problems caused by loss of the discarded
- information. As noted above, this cannot be solved by purely technical means;
- value judgements must be made about which information to discard. It will be
- usually be necessary for the administrators of any given area to help decide
- what the representation of that area will be when the internal details are
- hidden.
- Since this is now a problem of entering configuration information, it
- is desireable to do this in a way that allows no chance for inconsistent data.
- A default algorithm will also be provided, for those who do not care what the
- representation of their area is, or are unsophisticated and cannot participate
- in chosing the representation of their area. The fact that there is no
- specified algorithm for doing this also allows improved default algorithms to
- be incrementally deployed as they are conceived.
- However, the problem of non-optimal or unfound routes remains. The
- solution to this part of the puzzle requires the next key component.
-
- 4.2.2 Source Routing
-
- The source routing is a necessary consequence of both the trust model
- part of policy considerations, and the expandable attribute list, as outlined
- above. Having the source set the route avoids both of these difficulties. As
- mentioned, it also allows the incremental deployment of better algorithms for
- creating routes as ongoing research provides them, and also provides a means
- to attack the non-optimal route problem, as noted immediately above.
-
- Note that the route (nor indeed, the new style address) is not
- necessarily carried in each and every packet; this is an engineering decision
- to be made on the basis of the costs and benefits of doing so. The costs are
- primarily the increased overheard of carrying around the extra data. There is
- another issue in that the existing IP packet format will probably not allow
- 'on the fly' modification to hold the source route, since the latter will
- probably be too large to fit into the IP header. Depending on the length of
- the addresses, these could be a problem as well. This could be solved by
- 'wrapping' the packets, but at considerable cost in complexity and switching
- speed (and some in line utilization).
- The benfits of carrying the source route and/or address in each packet
- are that the routers may remain more stateless. It is anticipated that routes
- will not be carried in packets, and there will be a 'route set-up', as part of
- the 'flow set-up', which happens invisbly to the client of the packet layer.
- This flow will not be critical state, like a connection, but rather a hint. (A
- 'hint' is data which allows processing to be optimized, but which is does not
- have to be present to allow correct operation.) This may be recreated by the
- routers without reference to the source of the flow setup should any
- intervening node fail.
-
- Obviously, since the source (or an agent close to it) choses the
- entire route, it is trivial for the source to enforce whatever constraints it
- wishes on the path that its traffic takes (even constraints which are not
- stateable in anything less than a general purpose computational language).
- Alternatively, if the agent computes routes using links with attributes it
- understands, but which the intervening switching nodes do not, the traffic
- will be routing correctly, their lack of comprehension notwithstanding.
- Also, if in computing a route, the agent generating the route comes
- across an attribute it does not understand (either because it was recently
- introduced, and the agent does not yet know how to deal with it, or because
- it is a private attribute which the agent will never understand), the
- agent will still be able to compute a route. If all attributes carry some
- limited information about whether they are restrictive (certain types of
- traffic will not be allowed across) or informational (giving information
- about the link) it would be possible for a route to use links which have
- attributes the agent does not understand.
-
- Since the entire route is calculated by one agent, it clearly does
- not matter if different agents use different algorithms for generating
- routes. Deployment of new algorithms, or a choice between several existing
- ones (for example, one which is fast but which does not produce good routes
- would be useful for one-shot traffic, whereas more complex and expensive
- ones would be appropriate for long-lived flows sending a lot of data)
- thus becomes trivial. [SalzterSR]
-
- 4.2.3 Local Abstraction Control
-
- As for the non-optimal route problem, it was noted above that the use
- of hop-by-hop routing required both a consistent algorithm and a consistent
- database. Discarding hop-by-hop routing allows relaxation of the latter
- requirement as well. This allows a powerful (and previously unconsidered)
- adaption of the typical operation of an LS algorithm. Canonical LS algorithms
- require everyone to have an equivalent map.
- ("Equivalent" is used instead of "identical" since in existing
- multi-level protocols such as OSPF, agents in different areas will each have
- (identical) top level maps, as well as maps of their own area, but of course
- these latter maps are different. All routers within a single area, however,
- will (and must) have identical maps.)
-
- However, there is no requirement in this architecture that this be so.
- Neighbouring routers at the same level may have wildly different maps. One
- might have only a default representation of a neighbouring area; another,
- which is sending a lot of traffic into that area, might have detail on the
- internal connectivity of that area, to allow it to pick the optimal entry
- router into that area for the traffic it is handling. Any node may call for
- more detailed topology information about any part of the system it wants,
- provided that that information is accessible to it, and the cost of providing
- that information are borne somehow.
- However, the implications of lifting this restriction are
- considerable. As noted above, the thinning problem is essentially a
- cost-benfit tradeoff; making this tradeoff at the area which is being
- described imposes the area's idea of the correct cost-benefit tradeoff on
- everyone. However, allowing each route generator to make its own decision on
- how much detail to keep in its maps allows this cost-benfit tradeoff to be
- made at the client, i.e. with far greater flexibility.
- It will be necessary for the costs of keeping maps with more detail
- to be shifted to those agents which wish to keep them; this will require
- attention at the time that accounting and resource usage algorithms are
- brought to the network.
- The default abstraction algorithm is thus seen as that which will
- provide the theoretical minimum of data necessary to route traffic through the
- system. The actual routes taken by traffic under this algorithm will thus be
- strictly heirarchical. Looked at another way, local abstraction control thus
- allows a way to do non-heirarchical routing.
-
-
- 5 Details of the Architecture
-
- Many of the components necessary to this routing architure and
- protocol can be 'borrowed' from the work being done on the Open Routing
- system, and similar LS systems such as the ARPANet algorithm, OSPF [Moy], etc.
- These include the mechanisms to do reliable database distribution and flow
- setup. According, description of these solved problems will be skipped in this
- document.
- This section includes more detail on certain key parts of the
- architecture, along with calling attention to those parts in which engineering
- questions remain.
-
- 5.1 Multi-Level Topological Representation
-
- The first step is to consider the maps (graphs, actually) which
- represent the actual detailed topology. It is worth noting that in the graph
- representation of typical networks, there are usually two kinds of nodes;
- 'network' nodes and 'router' nodes. The arcs then represent network
- interfaces. It is possible to have only a single kind of node (either
- networks or routers), but this is artificial, loses valuable information, and
- is more difficult to work with in practice than the more complex version.
- This architecture will almost certainly use this type of model.
- In the maps, it will be possible to take an contiguous section of the
- graph (i.e. some number of nodes and their connecting arcs), and reduce it to
- a simpler representation. Such a reduced section is called an 'area'. Areas
- may be nested, and areas have as an attribute a level; an area which contains
- another area has a level one higher than the contained area.
- It is unclear whether an area will need to be represented as a section
- of graph, or just a node. In either case, a new type of node for an area is
- probably needed. The issue probably hangs on whether the rules for
- constructing topologies demand router nodes between area nodes or not. In the
- former case, the minimum representation of an area would be the collection of
- border routers plus a pseudo-node for all the rest of the topology of the
- area, to which the border routers are connected. If the area has complex
- transit policies which vary depending on which pairwise set of border routers
- are picked, a more complex representation allowing this to be expressed would
- of course be necessary.
-
- It is not clear whether it is best to allow only 'strict' reduction,
- in which an area of level k would be allowed to contain only obects of level
- k-1, or whether 'loose' reduction, in which objects of varying levels less
- than k could be contained, is preferable. The latter is clearly more flexible,
- but might make the engineering more complex. This decision can probably be put
- off until the detail design happens. What is important is to realize that the
- division into areas provides the framework for the default abstraction
- algorithm; agents outside an area will not in general have detailed
- information on the internal topology of the area.
-
- On the subject of border routers, it seems wisest to set the rule
- that all routers belong to a single area. This is a good match to reality,
- where most of the physical assets have an owner; the half-router model has
- proven to be less useful in practise.
- One aspect of representation that needs to be addressed is the
- question of configuration errors; this is the most likely place for such
- errors. One technique that appears useful is to simply indicate to each router
- whether or not it is a boundary router for a k level area, not which k level
- area it belongs to [BBN]. Thus, misconfiguration of a router simply joins two
- k-level areas togther, but has no other ill effects on the routing. Similar
- approaches must be taken throughout this aspect to minimize configuration
- and prevent configuration errors from being a major problem.
-
- In any case, it is clear that some decisions about open questions
- need to be taken about the multi-level topology representation before any
- other parts of the design can be worked on in detail. Obviously, these
- decisions may need to be revisited if work elsewhere reveals that a poor
- choice has made the design of some other part more complex than need be.
-
- 5.2 Multi-Level Addresses
-
- The address format chosen will need to map closely onto the
- representation of network topology, which in turn is chose to make the job of
- the routing simple. Remember, the address is simply a way of naming an
- attachment point to the network, in other words, specifying a place in the
- graph, and it is essential that given an address, it must be easy to correlate
- that address with its location in the topology graph easily. (It might be
- argued that the 'address' need not have that property, but it might map to
- something else which does. All that is saying is that terminology has been
- confused, and the wrong name for a network attachment point has been called
- the address. The final concept will inevitably have to be one which is easy to
- locate.)
- In essence, what is proposed is a multi-level heirarchy, with a
- variable number of levels, each of variable length. This may seem like
- overkill, but remember that these addresses are only needed to create flows,
- and since the cost is minimal it is better to have too much flexibility than
- too little. The mapping from addresses to the map is fairly simple; the first
- element of the address identifies which top level area an address is in, the
- next element which of the next level areas within that top level area, and so
- on.
-
- In any event, the bottom level will be the level representing real
- physical assets, and numbering will proceed up from there. The advantage of
- this is that two different systems with differing address spaces, each
- allocated without reference to the other, can be joined by the simple
- expedient of creating a new level above the top of each existing system and
- making each system appear as a simple object in that level. [Reed]
- Alternatively, a small independantly numbered system could be joined onto a
- large existing system as a branch at the appropriate level.
- This will also avoid useless arguments about whether or not something
- deserves to be a 'top-level' area or not. Since there is no such thing as a
- top-level area, pointless fights over status should mostly be avoidable. Note
- that good functioning of the resulting system demands proper engineering of
- the topology, and of the areas; since the area definitions provide the
- framework for the abstraction process, which in turn affects the default
- heirarchical routing, a poor choice of area boundaries will cause the default
- abstraction process, and the default heirarchical routing, to work less well,
- although it will continue to function.
-
- The analogue in the adressing area to the question of only allowing
- 'strict' areas is whether or not networks (and presumably hosts and routers
- as well) can have numbers at any but the lowest level. If a k level area is
- allowed to contain physical assets, presumably they can be addressed with
- k-1 level addresses, without going all the way down to the bottom level.
- The question of exactly what form the addresses of networks and
- routers (if any) take is open. In the current architecture, no real provision
- was made for numbering networks, although the convention of a zero 'rest'
- field has come to mean the network. Routers are currently indistinguisable
- from hosts (which is good and bad). In the new systems, hosts will not have
- addresses per se; the lowest level item which can be addressed is a network
- interface, and presumably typical hosts will have one each.
-
- Once again, choices have to be made, but with the understanding that
- they may have to be revisited in light of work elsewhere on the architecture.
-
- 5.3 Topology Information Flooding and Route Generation
-
- It is next necessary to consider the question of which devices perform
- these functions. It seems most plausible that the first function is co-located
- in the routers themselves, for reasons of fate-sharing; connectivity
- information is flooded through the very connection elements that are being
- described. This also removes dependency loops where topology flooding agents
- which are not routers are required to make contact with neighbour topology
- flooding agents through routers.
- Route generation could, however, quite easily be performed in separate
- devices, which communicate with routers (both local, and distant if they wish
- to have detailed information about distant areas). It might be useful to build
- this function into routers, but this is an implementation decision. Certainly,
- clients with complex policy requirements will probably desire to generate
- their own routes.
-
- 5.4 The Default Abstraction Algorithm and Local Abstraction Control
-
- Given the discussion above, the default abstraction algorithm is very
- simple. As stated above, the division into areas provides the framework for
- the default abstraction algorithm. The routers which are border routers for an
- area are responsible for doing the abstraction on the area topology when it is
- advertised outside the area.
- Two possibilities exist for a simple algorithm to provide a default
- representation of an area. One, the N^2 model, maps the area as a complete set
- of connections between all the border routers. This model can fairly
- accurately represent the true characteristics of the connection between any
- two pairs of routers. (Of course, how to do so in the presence of policies
- and types of service, where a multitude of possibile connections are possible,
- will require a bit of work; probably the least policy-limited path should be
- advertized.) In the other, the pseudo-node model, all the border routers are
- connected to a single node in the center. This cannot accurately represent the
- metrics between any two border routers, but an averaging system will give a
- good guess.
- The option is always available to go to a representation with human
- input, or to a representation prepared by some other algorithm. Some control
- needs to be created to prevent areas from flooding the system with
- unnecessarily complex representations of themselves.
- Of course, if an area is represented solely by a single pseudo-node,
- none of these problems arise, which is the attraction of this possible way of
- representing an area. The disadvantage is that then areas must have transit
- attributes which mirror the attributes (both policy and type of service) on
- the actual transmission links which connect the border routers.
-
- The working of the local abstraction control (somewhat inelegantly
- named the "Blister Model") is simple to understand in this context. Generally,
- any routing agent outside an area is given, as a default, whatever
- representation that area has decided to purvey. However, should a routing
- agent wish a more detail of a non-local part of the map, nothing is to stop it
- (other than information-hiding access control, of course) getting the
- information and upgrading its map.
-
- 5.5 Virtual Links and Flow Repair
-
- One concept which will probably appear as part of the representation
- of an area is that of a virtual link. Where it is desired to describe the
- connection between two border routers, without providing detail on the
- constituent links which actually connect the two, a virtual link (with
- the attributes of the complete path between the two) could be advertised.
- One key aspect of this idea is the ability to do local repair on flows
- when topology changes happen. If a virtual link at level k, which was
- advertised to a source route generator and which is used in creating a flow,
- suffers a component failure, two outcomes are possible. First, the virtual
- link may be reconstituted with the same attributes using other lower level
- assets. In this case, the repair would be local, with no necessity for
- intervention on the part of the source route creator. As an option it might be
- notified if some attribute, such as delay, increased more than an allowable
- amount on some component it used in setting up a flow. Second, if such a
- reconstitution cannot be made, the k+1 level virtual link of which the k level
- link is a constituent is notified, and the algorithm is then run recursively
- at a higher level. Only if some asset which the source route creator fails
- would it be necessary for the creator to select a new path for the flow.
-
- 5.6 Partitioned Areas
-
- Handling of partitioned areas needs to be addressed. A number of
- methods are possible. One obvious method is to reconnect the two parts of the
- partitioned area by going through another area.
- Another elegant method gives each level 1 area a globally unique
- identifier. A k level area is given the identifier (at k level) of the lowest
- (or highest, or some unique algorithm) numbered k-1 level area within it.
- Then, if a k level area partitions, it automatically creates a new k level
- area, which is automatically numbered from its constituent k-1 level areas.
- The disadvantage of this is that the address of everything within the new k
- level area has changed, which is probably a substantial disadvantage (although
- not insuperable).
-
- 5.7 The Node ID
-
- The node identifier (what used to be the IP address) is a key part of
- the system; not only does it provide new capabilities, but it makes the new
- system incrementally deployable. Instead of being a number with structure, as
- it is currently (which is causing inefficient use of the number space), it is
- simply a unique 32-bit (initially) number. That way, we put off the day when
- that address space runs out, since we can use it efficiently instead of
- wasting large gaps, as we do now.
- A number of outstanding issues, particularly mobile hosts and
- multi-homed hosts, can be solved with this, with no change to the upper
- levels. An open TCP connection to a given node identifier will stay open even
- if the node moves and gets a new address. (The flow may need to be rehomed,
- but this will be invisible to the TCP, and perhaps to the host entirely.) If
- an explicit mapping exists from a node identifier to multiple addresses, many
- issues having to do with multi-homed hosts (both for reliability as well as
- bandwidth) become much simpler.
- The source and destination of packets in the network will continue to
- be identified by these identifiers. If we chose not to include the new
- addresses in each packet (as seems likely), a packet arriving at a router is
- assigned to a particular flow by inspecting these numbers. This process will
- be at least as quick as current routing, and if the addresses are not included
- in the packets, there will be no extra line overhead either.
-
- 5.8 Mapping to the Address, from the Name and from the Node ID
-
- Obviously, a system for providing the new addresses, and mapping
- between them and other identifiers, such as host names and identifiers, needs
- to be available. It clearly ought to be distributed and replicated. We already
- have a distributed replicated system for containing mappings; the DNS. It
- would probably be simple to add this mapping as well.
- When going from the string name, the obvious thing is to return the
- new address as well. For clients which need to map from node identifiers to
- addresses, some analogue of IN-ADDR will have to be provided.
- One thing to be careful of is that in the process no dependency loops
- are formed. For instance, error reporting might need to get an address if all
- that is on hand is a packet with the node identifier. The actual process of
- getting routing information around would have to be examined carefully to make
- sure it does not rely on the existence of this mapping system, otherwise
- dependency loops will certainly form.
-
- 5.9 A Sample Route Generation Algorithm
-
- One simple way of generating a route in a complex policy environment
- is to run over the map, dropping all links which are unusable for policy
- reasons or because they do not match the type of service desired. It would
- then be possible to run the Dijkstra algorithm to create a routing tree, and
- thus a route, for a given metric which is it desired to minimize (probably
- delay or cost). To optimize several metrics at once, a formula for weighing
- the metrics together to create a composite metric needs to be provided; as
- each link is examined in the Dijkstra, the composite metric will be calculated
- from the exact metrics of the link, and the composite metric will be what the
- Dijksta minimizes.
- Metrics such as bandwidth, or MTU, where minimization over a path is
- inappropriate, would be handled by dropping inappropriate links in the first
- phase; if no route can be found, the actual requirement on this metric could
- be lowered and the process repeated to see if a route appears when a less
- optimal value is allowed.
-
- 5.10 Authentication and Robustness
-
- Making the system robust against attack is vitally necessary. One
- approach would be to tag all data with a private key system; this guarantees
- that topology information could not be modified as it is flooded through
- the system. The same approach could be used in flow setup; acknowledgements
- tagged with the private key would prevent the flow from being hijacked to
- a different destination by a corrupted switch (although it could be copied).
- Protection against denial of service attacks are more difficult. In
- the above example, the corrupted switch could prevent the flow from being
- set up, although it cannot make it appear that the flow has been correctly
- set up when it cannot. Of course, if the switch refuses to set up the flow,
- an alternate path could be picked which avoids that switch.
- Another difficult problem is bogus connectivity information. Some
- checks can be made, such as ensuring that both ends of a link agree that it
- exists, but this will still not catch collusion. Once again, it can be
- detected that this is happening if the flow does not set up correctly, and the
- errant switches avoided. However, if the switches claim a direct connection,
- which causes traffic to flow to them, and then route the traffic through,
- it will be difficult to detect except by measuring the service provided.
-
-
- 6 Possible Optimizations
-
- Although this document is not an engineering design, there are a
- number of possible optimizations which are worth discussing here. Many other
- such optimizations are possible. They have not been considered in detail here,
- since they are generally low level engineering and depend on the exact details
- of the structure chosen. The key focus in this document is to discern the hard
- problems, and pick a broad line of attack on them.
-
- 6.1 Default Routing Tables
-
- One useful optimization which might be included is to provide a
- facility for handling traffic without the overhead of calculating a route
- and setting up a flow. [Estrin] Needless to say, this could only be done
- for a limited number of classes of traffic, but it is possible. Basically,
- a way would exist to indicate that traffic was not part of any flow, and
- one (or a small number) of default routing tables for such traffic would be
- precomputed, in all routers, as is the current practise.
- This would not be unreasonably expensive. For example, assume that
- the system contains 5 levels, and the fan-out at each level is on the order
- of 200. Thus, the system could handle up to 200^5 networks, or 3x10^10
- networks; this should be large enough for the reasonable future. Such a
- system would only mean that the average router with a minimal map would
- need a map with 5*200*(3+1), or 4000, nodes (assuming that routers outnumber
- networks/areas by a 3:1 ratio, which seems about right from the current
- system; this gives triple connectivity to each network, or fairly redundant
- connectivity). Assuming each router is connected to 3 networks, then the
- map would contain 5*200*(3*3), or 9000, arcs. Such a map would be fairly
- easily handlable by even the current generation of router hardware.
-
- The problem with this idea (within this architecture) is that a
- mapping must still be created from the destination node identifier to the
- address. How does this mapping happen? One possibility is to have to have
- first hop router (or the host) add it to the packet. Alternatively, each
- router along the path would have to do the mapping. Clearly, the mapping in
- each intermediate router could be installed via a setup, but this would just
- get us exactly what we are trying to avoid!
- Also, it is not clear if this optimizations will be useful in the
- future. It remains to be see whether if, in a system in which policy
- configuration and access controls are much easier, and thus more common, much
- traffic is sent out in the default class, and if the free, general access
- model of network use we have enjoyed continues. If not, the concept of default
- routing tables is probably not a useful one, since almost all traffic would
- need a flow set up to handle it.
-
- 6.2 Map Discarding in Stub Areas
-
- One additional idea, if default routing tables are included, is to
- support stub areas. The complaint might be raised that even supporting a
- map with 4000 nodes (from the example above) is pointless if an area is
- only singly connected to the outside world, or neither wants to support
- transit traffic nor pick optimal area exit routers. In these cases having
- the topology map for the rest of the system outside the area is
- unnecessary; the traffic is simply sent to any border router, and is routed
- from there.
- (The former simplification is obvious; doing either of the latter
- two means that traffic must be able to differentiate as to which border
- router is the best one to head toward; i.e. traffic inside the system must
- have access to a full outside map to know which border router is the best
- exit router, if routing loops are to be prevented.)
-
- 6.3 Flexible "No-brainer" Routes
-
- One major potential issue with the current architecture is the fact
- that the simplistic "no-brainer" route is that which is computed with the
- minimal map, and that minimal map implies strict heirarchical routing.
- The problem with that is that a K level area, such as a campus, which
- in actual physical terms has connections to several long-haul networks, each
- of which is probably a separate K+1 level area, has an insufferable choice to
- make. Depending on which K+1 level area they associate themselves with, any
- traffic using the simple "no-brainer" routing will get to that K level area by
- means of the long-haul network with which the K level area is associated in
- its global address. This is the problem referred to some while above, where it
- was indicated that perhaps using the address to generate the "no-brainer"
- route is an overload of that name which finally breaks down.
-
- One solution to this problem involves the use of what are called
- "route suffixes" [Clark]. When the host-name to address translation is
- performed, the address which comes back comes with several incomplete source
- routes, which represent potential useful paths, terminating at the
- destination, through the nearby topology. If the most detailed (last) item in
- the route suffix is not know to the source, it backs up and tries the previous
- (less detailed) item, and so on. Given several route suffixes, probably one
- for each major long-haul network which an area is attached to, the source can
- easily make a determination as to which long-haul network it prefers to
- use to get to the destination.
- The major advantage of this scheme is that is it fairly efficient; a
- small amount of extra data is passed back at the time of the address lookup
- which allows a choice to be easily made. What mechanisms are available in this
- architecture to do this?
-
- One rough equivalent in this architecture can easily be found by
- allowing each network attachment point to have multiple addresses; this would
- indicate that a destination could be reached via multiple long-haul networks.
- (This is equivalent to allowing K-level areas to overlap.) This solution is
- not preferred since it conflicts with a previous goal of the address, which is
- to provide a way of concisely describing the topology. If any network
- attachment point can have many names (addresses), this will be more difficult.
- In effect, this approach overloads yet one more function onto the
- address, which is to optimize picking a slightly more optimal route; the
- address is not really suited to having to support this one further function,
- however. There are a number of ways to do this which do fit well within the
- architecture, though.
-
- The first is a slightly more general variant of the original scheme,
- which is to use the local abstraction control mechanism to discover some
- details of the topology around the destination. This is seems to be less
- efficient than the "route suffix" mechanism, but has the same effect; the
- source is given more information which allows it to make a choice.
- The second, in some senses a variant of the first, is to remove from
- the map all areas which do not allow transit traffic; i.e. "stub" areas. This
- would greatly diminish the number of nodes in the map, and allow more detail
- to be kept. Whenever a source needed to send traffic to a destination in that
- stub area, it could pick up topology information about that area and enter
- it into the map.
-
- Both of these represent a slightly more formal way of doing
- essentially what the route suffix does, which is provide more routing
- information specific to the destination. The difference is that the
- information in the route suffix is picked by the destination, as opposed to
- the source being given general information about the topology near the
- destination.
- One final option is to provide essentially exactly the route suffix
- mechanism. As alluded to above in the discussion of fundamental object
- classes, it might be necessary, if this optimization is deemed important, to
- provide a new object class, that of route suffix. Rather than overload this
- function onto any of the existing object classes (and the names for those
- objects), it is perhaps best to simply bit the bullet and create the new
- object class.
-
-
- 7 Deployment
-
- A great advantage of this architecture is that it can be deployed in
- an incremental fashion, and furthermore that it does not require any changes
- in hosts for fairly full operation (although minor changes would improve
- things somewhat).
- The following section is a first crack at defining how the deployment
- would work.
-
- 7.1 Incremental Deployment in the Routers
-
- The system can fairly obviously be interoperated with the existing
- architecture provided host identifiers continue to be allocated along the
- existing lines of network numbers. This will allow the system to be deployed
- and brought up incrementally in the routers.
- In an old-style router, it will still be given lists of network
- numbers, and the metric to each. In new-style routers, areas of the network
- which are being serviced by old-style routers will be masked by 'transitional'
- routers which provide a mapping from the old routing protocol to the new
- system.
- Of course, as the number of networks mounts, the existing routing
- will break down, and render the old-style routers non-functional. By that
- time, basically all the main routers should have been converted to use the
- new system, and the old routing mechanisms can be turned off. Stub routers
- which use default only can continue to operate as before, obviously. Once the
- old-style routing mechanisms are turned off, allocation of host identifiers
- can proceed in a more flexible and space efficient form.
- Obviously, old-style routing could continue to be used once this
- happens, but hosts served by old-style routers would probably not be
- able to get access to the new hosts, since the new host identifiers would
- not contain any topology information the old-style routers could use to
- route the traffic. One possible way around this is to have all traffic
- to what looks like new network numbers send to a new router which would
- perform the mapping in a similar way to the way hosts are supported.
-
- 7.2 Incremental Deployment in the Hosts
-
- For the first stage, no changes would be absolutely necessary in the
- hosts at all. The host would translate from the name to the host identifier,
- and send out an existing packet to the first hop router.
- The first hop router would then do the flow setup, using some default
- policy attributes. The destination network address would either be looked up
- separately, with attendant overhead, or, if the router overheard the name
- lookup, it might have cached it.
-
- The issue of network masks and node identifiers needs to be thought
- about to make sure that getting from hosts to neighbouring hosts, as well as
- first hop routers, continues to work.
- Clearly, if node identifiers are assigned without respect to topology,
- then the mask and match process can no longer be used to determine whether or
- not a given destination is on the local wire. Doing anything else means that
- the address space will not be allocated inefficiently, bringing us back to
- where we were! However, given the practical difficulties involved in true
- random allocation of node identifiers, it may be necessary to allocate in
- blocks.
- One issue is involved in the translation of node identifiers to local
- hardware addresses. Some possibilities exist involving use of ARP, but these
- are nasty. For networks which depend on direct mapping from the node
- identifier to the network address, it is obviously unavoidable to allocate
- chunks of the node identifier space.
- Another problem involves making sure that each host understands how to
- get to its first hop router for off-network traffic. If hosts exactly follow
- the Host Requirement RFC, and are set up with an all ones subnet mask, then
- any attempt to send traffic to another host will perforce go to a router. The
- default router table will have been filled in via the Router Discovery ICMP
- mechanism, and the router's local network address will be resolved as
- discussed above. This is inefficient for traffic on the local network,
- however; if the all ones mask approach is used, routers will also have to stop
- sending Redirects!
- If a proposed document on routing in the host is worked on (or
- alternatively a revision to the Host Requirement RFC is put in hand), this
- topic should be carefully considered to make sure the proposed mechanism will
- integrate well into this new approach.
-
- Eventually, the process could be optimized by slight changes in the
- host. It would ask for and remember the new type address when it contacted the
- name server; if it had special policy requirements (or charge authorizations,
- or whatever) it would include these in the packet it sends out, either in the
- initial connection packet or in a special communication to the first hop
- router, or whichever box is doing the flow setup.
-
- 7.3 Upgrading to a Long Node ID
-
- A 32-bit system wide node identifier is clearly unsuitable in the
- long run. One possibility for the latter is to make the UID's locally
- (perhaps pairwise locally) significant only; however, this is complex, and
- loses some of the advantages of having a node idenfitier.
- The other main possibility is a new IP packet format with 64 bit
- UID's. A new packet format might be desirable anyway, for other reasons. If
- the 64 bit UID path is chosen, and the phaseover started before the 32 bit
- space is used (so that the UID of any node in the new 64 bit space is simply
- the zero extension of its UID in the 32 bit space), there will not be any
- cases of new and old style hosts which cannot communicate due to the inabilty
- of the old address space to name hosts in the new space, which is the usual
- cause of problems in conversions. Once basically all hosts have been
- converted to the new format, use of the rest of the identifier space can
- proceed. This is probably the preferable path.
-
- A number of ways of interoperating old and new style hosts exist.
- The first hop router might automatically convert old-style packets to new
- style. The advantage of this is that hosts can throw out the old code; the
- disadvantages are the cost of the conversion to and fro when two old-style
- hosts are conversing. Another possibility is that hosts might continue to
- understand both old and new.
- In any case, there are four cases to be considered; old-old, old-new,
- new-old, and new-new (the first being the source and the second the
- destination of the packets). The tricky one is the new-old, since in any
- system something must convert the format; either the source host needs to
- find out that the host does not understand new-style (perhaps by trying
- a new-style first and seeing if it gets a response) and send an old-style,
- or the last-hop router, which needs to know which of its hosts are new and
- which are old, needs to do the conversion. A messy problem however it is
- sliced! One useful trick is to use spare bits in the headers to record
- packets which were sourced in the other format, or hosts which can generate
- and understand new format packets, etc.
-
-
- 8 Conclusion
-
- The system proposed above consists of many elements, with the utility
- of each created and increased by the interaction with others. The way in which
- they were put together was not as straightforward as it now appears. A long
- process of revisiting of requirements and tinkering with the design has
- resulted in the complete structure now laid out, with each piece dependant on
- others and part of a deeply integrated whole.
- The system also contains a number of old ideas, put together in a
- novel way, together with the addition of some new ideas. In both of these
- aspects (the extreme interdependence of a limited set of ideas, developed over
- time, together with the resuse of existing good ideas with some augmentation)
- it resembles the process which created Unix. [Thompson]
-
- It is believed that the proposed architecture detailed above meets all
- the goals outlined in the requirements, and in particular the size, making
- possible a extremely long (essentially indefinite) use of the IP architecture.
- Finally, it is worth noting again the many instances here in which
- the power and flexibility of the architecture are improved by leaving
- something out (route generation, data abstraction, etc), rather than by
- adding something.
-
-
-
-
-
-
- References (Incomplete)
-
-
- [BBN] ???; "Adding Areas to the ARPANet Routing Algorithm??";
- 1982??.
-
-
- [Chiappa84] J. Noel Chiappa; "??"; 'dev-cgw' mailing list; 1984?.
-
-
- [Chiappa91] J. Noel Chiappa; "The IP Addressing Issue"; Internet-Draft
- Yyyyyy 1991.
-
-
- [Clark]
-
-
- [Cohen] Danny Cohen; "On Names, Addresses and Routings"; IEN 23;
- January 1978.
-
-
- [Corbato] Fernado Corbato and Jerome H. Saltzer; "Multics: The
- First Seven Years?"; ??
-
-
- [Dijkstra] Edgar W. Dijkstra; "A Note on Two Problems in Connection with
- Graphs"; Numer. Math, Vol. 1, pp. 269-271; 1959.
-
-
- [Estrin] Deborah Estrin, Yakov rekhter and Steve Hotz, "A Unified
- Approach to Inter-Domain Routing"; In preparation.
-
-
- [McQuillan] John M. McQuillan, Isaac Richer and Eric C. Rosen; "ARPANet
- Routing Algorithm Improvments"; Bolt, Beranek and Newman Report No. 3803;
- April 1978.
-
-
- [Little] M. Little; "Goals and Functional Requirements for Inter-
- Autonomous System Routing"; RFC1126; October 1989.
-
-
- [Moy] John Moy; "The OSPF Protocol"; RFCXXXX; Yyy 1991
-
-
- [Reed] David Reed; "??"; CSR Group Talk; 1979??
-
-
- [SaltzerSR] Jerome H. Saltzer; "Source Routing?"; ??
-
-
- [Saltzer82] Jerome H. Saltzer; "On the Naming and Binding of Network
- Destinations"; Local Computer Networks, North-Holland Publishing; 1982.
-
-
- [Shoch] John F. Shoch; "Inter-Network Naming, Addressing and Routing";
- IEN 19; January 1978.
-
-
- [Thompson] Dennis M. Ritchie and Ken Thompson; "The UNIX Timesharing
- System"; CACM, Vol. 17, No. 7, pp. 365-375; July 1974.
-